BACKGROUND NOISE REDUCTION IN SINUSOIDAL
BASED SPEECH CODING SYSTEMS 

Background of the Invention 

Speech enhancement involves processing either degraded speech signals or
clean speech that is expected to be degraded in the future, where the goal of
processing is to improve the quality and intelligibility of speech for the human
listener. Though it is possible to enhance speech that is not degraded, such as by
high-pass filtering to increase perceived crispness and clarity, some of the most
significant contributions that speech enhancement techniques can make are in
reducing noise degradation of the signal. The applications of speech enhancement
are numerous. Examples include correction for room reverberation effects, reduction
of noise in speech to improve vocoder performance, and improvement of un-degraded
speech for people with impaired hearing. The degradation can be as different as room
echoes, additive random noise, multiplicative or convolutional noise, and competing
speakers. Approaches differ, depending on the context of the problem. One significant
problem is that of speech degraded by additive random noise, particularly in the
context of a Harmonic Excitation Linear Predictive Speech Coder (HE-LPC).

The selection of error criteria by which speech enhancement systems are
optimized and compared is of central importance, but there is no absolute best set of
criteria. Ultimately, the selected criteria must relate to the subjective evaluation by a
human listener, and should take into account traits of auditory perception. An example
of a system that exploits certain perceptual aspects of speech is that developed by
Drucker, as described in "Speech Processing in a High Ambient Noise Environment",
IEEE Trans. on Audio and Electroacoustics, Vol. AU-16, pp. 165-168, June 1968. Based
on experimental findings, Drucker concluded that a primary cause for intelligibility
loss in speech degraded by wide-band noise is confusion between fricatives and
plosive sounds, which is partially due to a loss of short pauses immediately before the
plosive sounds. Drucker reports a significant improvement in intelligibility after high
pass filtering the /s/ fricative and inserting short pauses before the plosive sounds.
However, Drucker's assumption that the plosive sounds can be accurately determined
limits the usefulness of the system.

Many speech enhancement techniques take a more mathematical approach,
using criteria that are empirically matched to human perception. An example of a
mathematical criterion that is useful in matching short time spectral magnitudes, a
perceptually important characterization of speech, is the mean squared error (MSE). A
computational advantage of using this criterion is that minimizing the MSE reduces to
solving a linear set of equations. Other factors, however, can make an "optimally
small" MSE misleading. In the case of speech degraded by narrow-band noise, which is
considerably less comfortable to listen to than wide-band noise, wide-band noise can
be added to mask the more unpleasant narrow-band noise. This technique makes the
mean squared error larger, even though the result is more comfortable to listen to.

The enhancement of speech degraded by additive noise has led to diverse
approaches and systems. Some systems, like Drucker's, exploit certain perceptual
aspects of speech. Others have focused on improving the estimate of the short time
Fourier transform magnitude (STFTM), which is perceptually important in
characterizing speech. The phase, on the other hand, may be considered as relatively
unimportant.

Because the STFTM of speech is perceptually very important, one approach
has been to estimate the STFTM of clean speech, given information about the noise
source. Two classes of techniques have evolved out of this approach. In the first, the
short time spectral amplitude is estimated from the spectrum of degraded speech and
information about the noise source. Usually, the processed spectrum adopts the phase
of the spectrum of the noisy speech because phase information is not as important
perceptually. This first class includes spectral subtraction, correlation subtraction and
maximum likelihood estimation techniques. The second class of techniques, which
includes Wiener filtering, uses the degraded speech and noise information to create a
zero-phase filter that is then applied to the noisy speech. As reported by H. L. Van
Trees in "Detection, Estimation and Modulation Theory", Pt. 1, John Wiley and Sons,
New York, N.Y., 1968, the goal of Wiener filtering is to develop a filter which can
be applied to noisy speech to form the enhanced speech.

Turning first to the class concerned with estimation of short time spectral
amplitude, particularly where spectral subtraction is used, statistical information is
obtained about the noise source to estimate the STFTM of clean speech. This
technique is also known as power spectrum subtraction. Variations of these
techniques include the more general relation identified by Lim et al. in "Enhancement
and Bandwidth Compression of Noisy Speech", Proc. of the IEEE, Vol. 67, No. 12,
December 1979, as:



$$|\hat{S}(\omega)|^{\alpha} = |Y(\omega)|^{\alpha} - \beta\, E\left[|N(\omega)|^{\alpha}\right] \qquad (1)$$

where $\alpha$ and $\beta$ are parameters that can be chosen, $Y(\omega)$ is the noisy speech
spectrum, $N(\omega)$ is the noise spectrum and $\hat{S}(\omega)$ is the estimate of the clean
speech spectrum. Magnitude spectral subtraction is the case where $\alpha = 1$ and
$\beta = 1$. A different subtractive speech enhancement algorithm was presented by
McAulay and Malpass in "Speech Enhancement Using a Soft-Decision Noise Suppression
Filter", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-28, No. 2,
pp. 137-145, April 1980. Their method uses a maximum-likelihood estimate of the noisy
speech signal assuming that the noise is Gaussian. When the enhanced magnitude yields
a value smaller than an attenuation threshold, however, the spectral magnitude is
automatically set to the defined threshold.
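
The subtraction rule and the threshold floor described above can be illustrated with a short sketch. It assumes the general relation has the usual form, in which the estimated clean magnitude raised to a power alpha equals the noisy magnitude raised to alpha minus beta times the noise magnitude raised to alpha. The function name, the per-bin list representation and the `floor` argument are illustrative choices, not taken from the cited papers.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=1.0, floor=0.0):
    """Per-bin generalized spectral subtraction (illustrative sketch).

    noisy_mag: magnitudes of the noisy speech spectrum, one per bin
    noise_mag: estimated noise magnitudes for the same bins
    alpha, beta: the two free parameters of the general relation
    floor: smallest magnitude allowed in the output (attenuation threshold)
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        diff = y ** alpha - beta * n ** alpha
        # A negative difference has no valid magnitude, so clamp it at zero.
        mag = max(diff, 0.0) ** (1.0 / alpha)
        # Values below the threshold are set to the threshold itself.
        enhanced.append(max(mag, floor))
    return enhanced
```

With `alpha=1, beta=1` this is magnitude subtraction, and with `alpha=2` it is power subtraction.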

Spectral subtraction is generally considered to be effective at reducing the
apparent noise power in degraded speech. Lim has shown, however, that this noise
reduction is achieved at the price of lower speech intelligibility (8). Moderate
amounts of noise reduction can be achieved without significant intelligibility loss;
however, large amounts of noise reduction can seriously degrade the intelligibility of
the speech. Other researchers have also drawn attention to other distortions which are
introduced by spectral subtraction (5). Moderate to high amounts of spectral
subtraction often introduce "tonal noise" into the speech.

Another class of speech enhancement methods exploits the periodicity of
voiced speech to reduce the amount of background noise. These methods average the
speech over successive pitch periods, which is equivalent to passing the speech
through an adaptive comb filter. In these techniques, harmonic frequencies are passed
by the filter while other frequencies are attenuated. This leads to a reduction in the
noise between the harmonics of voiced speech. One problem with this technique is
that it severely distorts any unvoiced spectral regions. Typically this problem is
handled by classifying each segment as either voiced or unvoiced and then only
applying the comb filter to voiced regions. Unfortunately, this approach does not
account for the fact that even at modest noise levels many voiced segments have large
frequency regions which are dominated by noise. Comb filtering these noise
dominated frequency regions severely changes the perceived characteristics of the
noise.
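
The averaging over successive pitch periods can be sketched as follows. This is a simplified illustration with a fixed, known pitch period in samples. An adaptive comb filter would track the period over time, which is not shown here. The function name and the choice of three periods are assumptions made for the example.

```python
def pitch_period_average(signal, period, num_periods=3):
    """Average each sample with the samples one, two, ... pitch periods earlier.

    Components that repeat every `period` samples (the harmonics) pass
    unchanged, while components that do not repeat are attenuated.
    """
    out = []
    for i in range(len(signal)):
        # Collect the current sample and the earlier ones that exist.
        taps = [signal[i - m * period]
                for m in range(num_periods)
                if i - m * period >= 0]
        out.append(sum(taps) / len(taps))
    return out
```

A signal that is exactly periodic with the given period is returned unchanged, which is the comb-filter pass band. A component that changes sign from one period to the next is reduced to about one third of its amplitude when three periods are averaged.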

These known problems with current speech enhancement methods have
generated considerable interest in developing new or improved speech enhancement
methods which are capable of reducing a substantial amount of noise without
adding noticeable artifacts into the speech signal. A particular application for such
a technique is Harmonic Excitation Linear Predictive Coding (HE-LPC), although it
is desirable for such a technique to be applicable to any sinusoidal based speech
coding algorithm.

The conventional Harmonic Excitation Linear Predictive Coder (HE-LPC) is
disclosed in S. Yeldener, "A 4 kb/s Toll Quality Harmonic Excitation
Linear Predictive Speech Coder", Proc. of ICASSP-1999, Phoenix, Arizona, pp. 481-484,
March 1999, which is incorporated herein by reference. A simplified block
diagram of the conventional HE-LPC coder is shown in Figure 1. In the illustrated
HE-LPC speech coder 100, the basic approach for representation of speech signals is
to use a speech synthesis model where speech is formed as the result of passing an
excitation signal through a linear time varying LPC filter that models the
characteristics of the speech spectrum. In particular, input speech 101 is applied to a
mixer 105 along with a signal defining a window 102. The mixer output 106 is
applied to a fast Fourier transform (FFT) circuit 110, which produces an output 111,
and to an LPC analysis circuit 130, which itself produces an output 131 to an LPC-LSF
transform circuit 140. The LPC analysis circuit 130 and the LPC-LSF transform circuit
140 combine to act as a linear time-varying LPC filter that models the resonant
characteristics of the speech spectral envelope. The LPC filter is represented by a
plurality of LPC coefficients (14 in a preferred embodiment) that are quantized in the
form of Line Spectral Frequency (LSF) parameters. The output 131 of the LPC analysis
is provided to an inverse frequency response unit 150, whose output 151 is applied to
mixer 155 along with the output 111 of the FFT circuit 110. The same output 111 is
applied to a pitch detection circuit 120 and a voicing estimation circuit 160.

In the HE-LPC speech coder, the pitch detection circuit 120 uses a pitch
estimation algorithm that takes advantage of the most important frequency
components to synthesize speech and then estimates the pitch based on a mean squared
error approach. The pitch search range is first partitioned into various sub-ranges, and
then a computationally simple pitch cost function is computed. The computed pitch
cost function is then evaluated and a pitch candidate for each sub-range is obtained.
After pitch candidates are selected, an analysis by synthesis error minimization
procedure is applied to choose the optimal pitch estimate. In this case, the LPC
residual signal is low pass filtered first and then the low pass filtered excitation
signal is passed through an LPC synthesis filter to obtain the reference speech signal.
For each pitch candidate, the LPC residual spectrum is sampled at the harmonics of the
corresponding pitch candidate to get the harmonic amplitudes and phases. These
harmonic components are used to generate a synthetic excitation signal based on the
assumption that the speech is purely voiced. This synthetic excitation signal is then
passed through the LPC synthesis filter to obtain the synthesized speech signal. The
perceptually weighted mean squared error (PWMSE) between the reference and
synthesized signals is then computed, and the procedure is repeated for each pitch
candidate. The candidate pitch period having the least PWMSE is then chosen as the
optimal pitch estimate P.
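
The final selection step, choosing the candidate with the least error against the reference signal, can be sketched as below. This is only an outline of the selection logic. The per-sample `weights` are a hypothetical stand-in for perceptual weighting, and `toy_synthesize` replaces the harmonic sampling and LPC synthesis chain with a plain cosine so that the example is self-contained.

```python
import math


def weighted_mse(reference, synthesized, weights):
    """Mean squared error between two signals with a per-sample weight."""
    total = sum(w * (r - s) ** 2
                for r, s, w in zip(reference, synthesized, weights))
    return total / len(reference)


def select_pitch(reference, candidates, synthesize, weights=None):
    """Return the candidate pitch period whose synthesis best matches `reference`."""
    if weights is None:
        weights = [1.0] * len(reference)
    return min(candidates,
               key=lambda p: weighted_mse(reference, synthesize(p), weights))


# Toy usage: the "synthesizer" is a plain cosine at the candidate period.
def toy_synthesize(period, length=160):
    return [math.cos(2.0 * math.pi * n / period) for n in range(length)]


reference = toy_synthesize(40)
best = select_pitch(reference, [20, 40, 80], toy_synthesize)
```

In the toy usage the reference has a period of 40 samples, so the candidate 40 gives zero error and is selected.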

Also significant to the operation of the HE-LPC is the computation of the
voicing probability that defines a cut-off frequency in voicing estimation circuit 160.
First, a synthetic speech spectrum is computed based on the assumption that the speech
signal is fully voiced. The original and synthetic speech spectra are then compared
and a voicing probability is computed on a harmonic-by-harmonic basis, and the
speech spectrum is assigned as either voiced or unvoiced, depending on the
magnitude of the error between the original and reconstructed spectra for the
corresponding harmonic. The computed voicing probability Pv is then applied to a
spectral amplitude estimation circuit 170 for an estimation of the spectral amplitude
$A_k$ for the $k$-th harmonic. A quantize and encode unit 180 receives the pitch
detection signal P, the noise residual in the amplitude, the voicing probability Pv and
the spectral amplitude $A_k$, along with the output $lsf_j$ of the LPC-LSF transform 140,
to generate an encoded output speech signal for application to the output channel 181.

In other coders to which the invention would apply, the excitation signal
would also be specified by a consideration of the fundamental frequency, spectral
amplitudes of the excitation spectrum and the voicing information.

At the decoder 200, as illustrated in Fig. 2, the transmitted signal is
deconstructed into its components $lsf_j$, P and Pv. Specifically, signal 201 from the
channel is input to a decoder 210, which generates a signal $lsf_j$ for input to an
LSF-LPC transform circuit 220, a pitch estimate P for input to voiced speech synthesis
circuit 240 and a voicing probability Pv, which is applied to voicing control circuit
250. The voicing control circuit provides signals to synthesis circuits 240 and 260 via
inputs 251 and 252. The two synthesis circuits 240 and 260 also receive the output 231
of an amplitude enhancing circuit 230, which receives an amplitude signal $A_k$ from
the decoder 210 at its input.

The voiced part of the excitation signal is determined as the sum of the
sinusoidal harmonics. The unvoiced part of the excitation signal is generated by
weighting the random noise spectrum with the original excitation spectrum for the
frequency regions determined as unvoiced. The voiced and unvoiced excitation
signals are then added together at mixer 270 and passed through an LPC synthesis
filter 280, which responds to an input from the LSF-LPC transform 220 to form the
final synthesized speech. At the output, a post-filter 290, which also receives an input
from the LSF-LPC transform circuit 220 via an amplifier 225 with a constant gain
$\alpha$, is used to further enhance the output speech quality. This arrangement
produces high quality speech.
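
The statement that the voiced excitation is a sum of sinusoidal harmonics can be illustrated with a minimal sketch. It assumes a constant pitch period over the synthesized block and takes per-harmonic phases as an input. How the decoder obtains or models those phases is not stated in the text, so the `phases` argument is an assumption of the example, as are the function and parameter names.

```python
import math


def voiced_excitation(amplitudes, phases, pitch_period, num_samples):
    """Sum the harmonics k = 1..L of the fundamental 2*pi/pitch_period.

    amplitudes[k-1] and phases[k-1] belong to the k-th harmonic.
    pitch_period is expressed in samples.
    """
    w0 = 2.0 * math.pi / pitch_period
    out = []
    for n in range(num_samples):
        sample = sum(a * math.cos((k + 1) * w0 * n + ph)
                     for k, (a, ph) in enumerate(zip(amplitudes, phases)))
        out.append(sample)
    return out
```

Because every term is a multiple of the fundamental, the output repeats every `pitch_period` samples.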

However, the conventional arrangement of HE-LPC encoder and decoder does
not provide the desired performance for a variety of input signal and background
noise conditions. Accordingly, there is a need for a further way to improve speech
quality significantly in background noise conditions.

Summary of the Invention

The present invention comprises the reduction of background noise in a 
processed speech signal prior to quantization and encoding for transmission on an 
output channel. 

More specifically, the present invention comprises the application of an
algorithm to the spectral amplitude estimation signal generated in a speech codec on
the basis of detected pitch and voicing information for reduction of background noise.

The present invention further concerns the application of a background noise
reduction algorithm on the basis of individual harmonics $k$ in a spectral amplitude
estimation signal $A_k$ in a speech codec.

The present invention more specifically concerns the application of a
background noise elimination algorithm to any sinusoidal based speech coding
algorithm, and in particular, an algorithm based on harmonic excitation linear
predictive encoding.

Brief Description of the Drawings

Figure 1 is a block diagram of a conventional HE-LPC speech encoder. 
Figure 2 is a block diagram of a conventional HE-LPC speech decoder. 
Figure 3 is a block diagram of a HE-LPC speech encoder in accordance with 
the present invention. 

Figure 4 is a block diagram detailing an implementation of a preferred
embodiment of the invention.

Figure 5 is a flow chart illustrating a method for achieving background noise 
reduction in accordance with the present invention. 

Description of the Preferred Embodiment

The preferred embodiment of the present invention can be best appreciated by
considering in Figure 3 the modifications that are made to the HE-LPC encoder that
was illustrated in Figure 1. The same reference numbers from Figure 1 are used for
those components in Figure 3 that are identical to those utilized in the basic block
diagram of the conventional circuit illustrated in Figure 1. The operation of those
components, as described therein, is identical. The notable addition in the improved
HE-LPC encoder 300 over the encoder 100 of Figure 1 is the background noise
reduction algorithm 310. The pitch signal P from the pitch detection circuit 120, the
voicing probability signal Pv from the voicing estimation circuit 160, the spectral
amplitude estimation signal $A_k$ from the spectral amplitude estimation circuit 170 as
well as the output of the LPC-LSF circuit 140 are all received by the background
noise reduction algorithm 310. The output of that algorithm, $\hat{A}_k$ 311, is input to
the quantize and encode circuit 180, along with signals P, Pv and $A_k$, for generation
of the output signal 381 for transmission on the output channel. The processing of the
signal $A_k$ in order to reduce the effect of background noise provides a significantly
improved and enhanced output onto the channel, which can then be received and
processed in the conventional HE-LPC decoder of Figure 2, in a manner already
described.

In considering the detailed operation of the background noise-compensating
encoder of the present invention, reference is made to Figures 4 and 5, which illustrate
the functional block diagram and flowchart of the algorithm that provides the
enhanced performance. The algorithm processes the pitch $P_0$, as computed during the
encoding process, and an auto-correlation function ACF, which is a function of the
energy of the incoming speech as is well known in the art.



The first step S1 of the speech enhancement process is to have a voice activity
detection (VAD) decision for each frame of speech signal. The VAD decision in
block 410 is based on the periodicity $P_0$ and the auto-correlation function ACF of the
speech signal, which appear as inputs on lines 401 and 405, respectively, of Fig. 4.
The VAD decision is a 1 if a voice signal is over a given threshold (speech is present)
and 0 if it is not over the threshold (speech is absent). If speech is present, there is
noise gain control implemented in step S7, as subsequently discussed.

If the VAD decision is that there is no speech, in step S2, the noise spectrum is
updated every speech segment where speech is not active, and a long term noise
spectrum is estimated in noise spectrum estimation unit 420. The long term average
noise spectrum is formulated as (2):

$$|N_m(\omega)| = \begin{cases} \alpha\,|N_{m-1}(\omega)| + (1-\alpha)\,|U(\omega)|, & \text{if VAD} = 0; \\ |N_{m-1}(\omega)|, & \text{otherwise} \end{cases} \qquad (2)$$

where $0 \le \omega \le \pi$, $|N_m(\omega)|$ is the long term noise spectrum magnitude, $\alpha$ is a
constant that can be set to 0.95, and VAD = 0 means that speech is not active. In this
formulation $|U(\omega)|$ can be formed in two ways. In the first way, $|U(\omega)|$ can be
considered to be directly the current signal spectrum. In the second way, harmonic
spectral amplitudes are first estimated according to equation (3) as:



$$A_k = \sqrt{\sum_{\omega=(k-0.5)\omega_0}^{(k+0.5)\omega_0} |S(\omega)|^2} \qquad (3)$$

where $A_k$ is the $k$-th harmonic spectral amplitude, and $\omega_0$ is the fundamental
frequency of the current signal, $|S(\omega)|$, which is an input to the noise spectrum
estimation circuit 420 along with the pitch $P_0$. Notably, $S(\omega)$ and $P_0$ are inputs to
each of the VAD decision circuit 410, noise spectrum estimation unit 420,
harmonic-by-harmonic noise-signal ratio unit 430 and the harmonic noise attenuation
factor unit 460, as subsequently discussed.
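
The long term noise spectrum update of step S2 can be sketched as a recursive average that runs only on frames without speech. The sketch assumes the update has the form suggested by the surrounding text and by claim 3: a weighted mix of the previous estimate and the current spectrum when VAD = 0, and no change otherwise. It uses the first of the two options for the input, namely the current signal spectrum magnitude. The function and argument names are illustrative.

```python
def update_noise_spectrum(prev_noise, current_mag, vad, alpha=0.95):
    """Update the long term noise magnitude spectrum for one frame.

    prev_noise: previous long term noise magnitudes, one per frequency bin
    current_mag: magnitudes of the current frame's spectrum (same bins)
    vad: 0 when speech is absent, 1 when speech is present
    alpha: smoothing constant (0.95 is the value suggested in the text)
    """
    if vad != 0:
        # Speech is active: keep the previous estimate unchanged.
        return list(prev_noise)
    return [alpha * n + (1.0 - alpha) * u
            for n, u in zip(prev_noise, current_mag)]
```

A value of `alpha` close to 1 makes the estimate change slowly, so a single noisy frame has little effect on it.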

In step S3, the Estimated Noise to Signal Ratio (ENSR) for each harmonic
lobe is calculated on the basis of $S(\omega)$, the excitation spectrum and the pitch input.
In this case, the ENSR for the $k$-th harmonic is computed as:

$$\gamma_k = \frac{\sum_{\omega=B_k^L}^{B_k^U} |N_m(\omega)\,W_k(\omega)|^2}{\sum_{\omega=B_k^L}^{B_k^U} |S(\omega)\,W_k(\omega)|^2} \qquad (7)$$

where $\gamma_k$ is the $k$-th ENSR, $N_m(\omega)$ is the estimated noise spectrum, $S(\omega)$ is the
speech spectrum and $W_k(\omega)$ is the window function computed as:

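$$W_k(\omega) = 0.52 - 0.48\cos\left(\frac{2\pi\,(\omega - B_k^L)}{B_k^U - B_k^L}\right); \quad B_k^L \le \omega \le B_k^U \qquad (8)$$

where $B_k^L$ and $B_k^U$ are the lower and upper limits for the $k$-th harmonic and
computed as:

$$B_k^L = (k - 0.5)\,\omega_0 \qquad (9)$$
$$B_k^U = (k + 0.5)\,\omega_0 \qquad (10)$$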
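
The per-harmonic ENSR can be sketched as a ratio of windowed energies over one harmonic lobe. Several details are assumptions of this example rather than statements of the source: the spectra are taken to be sampled uniformly on the interval from 0 to pi, the lobe is taken to run from (k - 0.5) to (k + 0.5) times the fundamental, and the cosine argument of the window is taken to span exactly one lobe. The function name and the zero returned for an empty or silent lobe are also illustrative choices.

```python
import math


def harmonic_ensr(noise_mag, speech_mag, w0, k):
    """Estimated noise to signal ratio for the k-th harmonic lobe.

    noise_mag, speech_mag: magnitude spectra sampled uniformly on [0, pi]
    w0: fundamental frequency in radians per sample
    k: harmonic index, starting at 1
    """
    step = math.pi / (len(speech_mag) - 1)
    lower, upper = (k - 0.5) * w0, (k + 0.5) * w0   # assumed lobe limits
    num = 0.0
    den = 0.0
    for j, (n, s) in enumerate(zip(noise_mag, speech_mag)):
        w = j * step
        if lower <= w <= upper:
            # Raised-cosine window across the lobe (assumed argument).
            win = 0.52 - 0.48 * math.cos(
                2.0 * math.pi * (w - lower) / (upper - lower))
            num += (n * win) ** 2
            den += (s * win) ** 2
    # An empty or silent lobe is reported as 0 here by convention.
    return num / den if den > 0.0 else 0.0
```

When the noise magnitude is half the speech magnitude in every bin, the ratio is 0.25 regardless of the window shape, since the window weights cancel.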






WO 01/59766 PCT/US01/04526

In step S4, the long term average ACF is calculated in section 440, using the
autocorrelation function (ACF), and on the basis of an input of the VAD decision in
section 410, an input is provided to noise reduction control circuit 450, which in step
S5 is used to control the noise reduction gain, $\beta_m$, from one frame to the next one:

$$\beta_m = \begin{cases} \beta_{m-1} + \Delta, & \text{if VAD} = 1; \\ \beta_{m-1} - \Delta, & \text{otherwise} \end{cases} \qquad (5)$$

where $\Delta$ is a constant (typically $\Delta = 0.1$) and

$$\beta_m = \begin{cases} 1.0, & \text{if } \beta_m > 1.0; \\ \text{min}, & \text{if } \beta_m < \text{min} \end{cases} \qquad (6)$$

where min is the lowest noise attenuation factor (typically, min = 0.5).
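
The frame-to-frame gain control can be sketched in a few lines. The sketch assumes the gain is stepped up by a constant on frames with speech and stepped down otherwise, and is then limited to the range between the lowest attenuation factor and 1.0, using the typical values quoted in the text (0.1 for the step and 0.5 for the minimum). The function and parameter names are illustrative.

```python
def update_noise_gain(prev_gain, vad, delta=0.1, min_gain=0.5):
    """Return the noise reduction gain for the current frame.

    prev_gain: gain used for the previous frame
    vad: 1 when speech is present, 0 when it is absent
    delta: per-frame step size
    min_gain: lowest allowed gain (strongest attenuation)
    """
    # Step toward 1.0 during speech and toward min_gain during noise.
    gain = prev_gain + delta if vad == 1 else prev_gain - delta
    # Keep the result inside [min_gain, 1.0].
    return min(1.0, max(min_gain, gain))
```

Because the gain moves by one small step per frame, the attenuation ramps in and out over several frames instead of switching abruptly at each VAD transition.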

In step S5, a harmonic-by-harmonic noise-signal ratio is calculated in section
430 and the harmonic spectral amplitudes are interpolated according to equation (4) to
have a fixed dimension spectrum as:

$$U(\omega) = A_k + \left[A_{k+1} - A_k\right]\frac{\omega - k\omega_0}{\omega_0}; \quad k\omega_0 \le \omega < (k+1)\omega_0 \qquad (4)$$

where $1 \le k \le L$ and $L$ is the total number of harmonics within the 4 kHz speech
band. The noise gain control that is calculated in step S7, on the basis of the VAD
decision output 1 or 0, and as represented in the block 450 of Fig. 4, is used as an
input to the computation of the noise attenuation factor in step S5. Specifically, in
step S5, the noise attenuation factor for each harmonic is calculated as:

$$\alpha_k = \beta_m \sqrt{1.0 - \mu\,\gamma_k} \qquad (11)$$

In this case, if $\alpha_k < 0.1$, then $\alpha_k$ is set to 0.1. Here, $\mu$ is a constant factor
that can be set as:

$$\mu = \begin{cases} 4.0, & \text{if } E_m > 10000.0; \\ 3.0, & \text{if } E_m > 3700.0; \\ 2.5, & \text{otherwise} \end{cases} \qquad (12)$$



where $E_m$ is the long term average energy that can be computed as:

$$E_m = \alpha E_{m-1} + (1.0 - \alpha) E_0 \qquad (13)$$

where $\alpha$ is a constant factor (typically $\alpha = 0.95$) and $E_0$ is the average energy
of the current frame of the speech signal.
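
The attenuation factor computation can be sketched as three small functions. The thresholds, the smoothing constant and the 0.1 floor are the values quoted in the text. The exact form of the attenuation formula is an assumption of this example: it is taken to be the gain multiplied by the square root of one minus the factor times the ENSR, and a lobe for which that radicand is not positive is simply given the floor value. All function names are illustrative.

```python
import math


def oversubtraction_factor(long_term_energy):
    """Pick the constant factor from the long term average energy."""
    if long_term_energy > 10000.0:
        return 4.0
    if long_term_energy > 3700.0:
        return 3.0
    return 2.5


def update_long_term_energy(prev_energy, frame_energy, alpha=0.95):
    """Recursive average of the frame energy."""
    return alpha * prev_energy + (1.0 - alpha) * frame_energy


def attenuation_factor(gain, ensr, mu, floor=0.1):
    """Attenuation for one harmonic, never below `floor` (assumed form)."""
    radicand = 1.0 - mu * ensr
    if radicand <= 0.0:
        # Noise-dominated lobe: fall back to the floor value.
        return floor
    return max(floor, gain * math.sqrt(radicand))
```

A harmonic with no estimated noise keeps the full gain, while a harmonic whose ENSR is large is attenuated down to the floor.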

The noise attenuation factor for each harmonic that was computed in step S5 is
used in step S6 to scale the harmonic amplitudes that are computed during the
encoding process of the HE-LPC coder, to attenuate noise in the residual spectral
amplitudes $A_k$, and to produce the modified spectral amplitudes $\hat{A}_k$.

The background noise reduction algorithm discussed above may be 
incorporated into the Harmonic Excitation Linear Predictive Coder (HE-LPC), or any 
other coder for a sinusoidal based speech coding algorithm. 

The decoder illustrated in Fig. 2 may be used to decode a signal encoded
according to the principles of the present invention. As when decoding a signal
processed by the conventional encoder, the voiced part of the excitation signal is
determined as the sum of the sinusoidal harmonics. The unvoiced part of the excitation
signal is generated by weighting the random noise spectrum with the original excitation
spectrum for the frequency regions determined as unvoiced. The voiced and unvoiced
excitation signals are then added together to form the final synthesized speech. At the
output, a post-filter is used to further enhance the output speech quality.

While the present invention is described with respect to certain preferred 
embodiments, the invention is not limited thereto. The full scope of the invention is to 
be determined on the basis of the issued claims, as interpreted in accordance with 
applicable principles of the U.S. Patent Laws.

I claim:

1. A speech codec comprising:

an input for receiving a speech signal; 

a linear time varying LPC filter that models the characteristics of the speech 
spectrum; 

a pitch detection section for generating an estimate of optimal pitch in the 
received speech; 

a voicing estimation section for computing the voicing probability that defines 
a cutoff frequency; 

a spectral amplitude estimation section, responsive to the output of the pitch
detection section and the voicing estimation section for generating an amplitude 
estimation for each harmonic; and 

a background noise generation section responsive to the output of said pitch 
detection section and voicing estimation section for modifying the amplitude 
estimation for each harmonic from said spectral amplitude estimation section. 

2. A speech codec, as claimed in claim 1, wherein said background noise 
generation section comprises: 

a voice activity detection section responsive to periodicity and an
autocorrelation function; 

a noise spectrum estimation section, responsive to the detection of voice 
activity and said pitch detection section for estimating the noise spectrum of said 
speech signal; 

a section responsive to said estimated noise spectrum and said pitch detection 
section and being operative to calculate harmonic by harmonic noise-signal ratio; 

a noise reduction control section for generating a noise control signal in 
response to an auto correlation function; and 

a harmonic noise attenuation factor section, responsive to said pitch detection 
section, said noise reduction control section and said auto correlation function for 
modifying said speech spectrum signal to provide a noise reduced output.

3. The speech codec, as claimed in claim 2, wherein said noise spectrum
estimation section is operative to generate a long term average noise spectrum as:

$$|N_m(\omega)| = \begin{cases} \alpha\,|N_{m-1}(\omega)| + (1-\alpha)\,|U(\omega)|, & \text{if VAD} = 0; \\ |N_{m-1}(\omega)|, & \text{otherwise} \end{cases}$$

where $0 \le \omega \le \pi$, $|N_m(\omega)|$ is the long term noise spectrum magnitude, $\alpha$ is a
constant that can be set to 0.95, and VAD = 0 means that speech is not active.

4. The speech codec, as claimed in claim 3, wherein $U(\omega)$ is one of the current
signal spectrum and a harmonic spectral amplitude calculated as:

$$A_k = \sqrt{\sum_{\omega=(k-0.5)\omega_0}^{(k+0.5)\omega_0} |S(\omega)|^2} \qquad (3)$$

where $A_k$ is the $k$-th harmonic spectral amplitude, and $\omega_0$ is the fundamental
frequency of the current signal, $|S(\omega)|$,

and interpolated to have a fixed dimension spectrum as:

$$U(\omega) = A_k + \left[A_{k+1} - A_k\right]\frac{\omega - k\omega_0}{\omega_0}; \quad k\omega_0 \le \omega < (k+1)\omega_0 \qquad (4)$$

where $1 \le k \le L$ and $L$ is the total number of harmonics within a speech band.

5. The speech codec as claimed in claim 2 wherein said voice activity detection 
section controls noise reduction gain frame by frame.

6. The speech codec as claimed in claim 2 wherein an attenuation factor for each
harmonic is computed on the basis of estimated noise to signal ratio (ENSR) for each
harmonic lobe.

7. The speech codec as claimed in claim 6, wherein the ENSR for the $k$-th
harmonic is computed as:

$$\gamma_k = \frac{\sum_{\omega=B_k^L}^{B_k^U} |N_m(\omega)\,W_k(\omega)|^2}{\sum_{\omega=B_k^L}^{B_k^U} |S(\omega)\,W_k(\omega)|^2} \qquad (7)$$

where $\gamma_k$ is the $k$-th ENSR, $N_m(\omega)$ is the estimated noise spectrum, $S(\omega)$ is the
speech spectrum and $W_k(\omega)$ is the window function computed as:

$$W_k(\omega) = 0.52 - 0.48\cos\left(\frac{2\pi\,(\omega - B_k^L)}{B_k^U - B_k^L}\right); \quad B_k^L \le \omega \le B_k^U \qquad (8)$$

where $B_k^L$ and $B_k^U$ are the lower and upper limits for the harmonic and computed as:

$$B_k^L = (k - 0.5)\,\omega_0 \qquad (9)$$
$$B_k^U = (k + 0.5)\,\omega_0 \qquad (10)$$

where $\omega_0$ is the fundamental frequency of the corresponding speech sequence.



8. The speech codec, as claimed in claim 6, wherein the noise attenuation factor 
for each harmonic is used to scale computed harmonic amplitudes. 

9. The speech codec, as claimed in claim 2, further comprising a LPC filter that 
models the characteristics of the speech spectrum, said filter being represented by a 
plurality of line spectral frequency parameters.

10. A method of correcting for background noise in a speech codec comprising:

detecting voice activity for each frame of speech signal, based on the periodicity
$P_0$ and the auto-correlation function ACF of the speech signal;

updating the noise spectrum every speech segment where speech is not active,
and estimating a long term noise spectrum;

calculating a harmonic-by-harmonic noise-signal ratio and interpolating the
harmonic spectral amplitudes;

calculating a long term average ACF and, on the basis of an input of the detected
voice activity, providing an input to control the noise reduction gain, $\beta_m$, from one
frame to the next one;

computing an attenuation factor for each harmonic based on the Estimated Noise
to Signal Ratio (ENSR) for each harmonic lobe;

calculating a noise attenuation factor for each harmonic; and

applying the noise attenuation factor to scale the harmonic amplitudes that are
computed during the encoding process.

11. The method of claim 10 wherein the updating step is performed on the
basis of $U(\omega)$ being the current signal spectrum.

12. The method of claim 10 wherein the updating step is performed on the
basis of an estimation of the spectral amplitudes as:

$$A_k = \sqrt{\sum_{\omega=(k-0.5)\omega_0}^{(k+0.5)\omega_0} |S(\omega)|^2} \qquad (3)$$

13. The method of claim 11 wherein the harmonic spectral amplitudes are
interpolated to have a fixed dimension spectrum.

14. The method of claim 12 wherein the harmonic spectral amplitudes are
interpolated to have a fixed dimension spectrum.

15. The method of claim 13 wherein the fixed dimension spectrum is
defined as

$$U(\omega) = A_k + \left[A_{k+1} - A_k\right]\frac{\omega - k\omega_0}{\omega_0}; \quad k\omega_0 \le \omega < (k+1)\omega_0 \qquad (4)$$

16. The method of claim 14 wherein the fixed dimension spectrum is
defined as

$$U(\omega) = A_k + \left[A_{k+1} - A_k\right]\frac{\omega - k\omega_0}{\omega_0}; \quad k\omega_0 \le \omega < (k+1)\omega_0 \qquad (4)$$



[Figure 5: flow chart of the background noise reduction method. Labels recoverable
from the drawing sheet: VAD flag; ACF (autocorrelation function); spectral amplitudes
(Ak); modified spectral amplitudes (Ak).]
INTERNATIONAL SEARCH REPORT

International application No. PCT/US01/04526

A. CLASSIFICATION OF SUBJECT MATTER

IPC(7): G10L 21/02
US CL: 704/226
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S.: 704/226, 223, 224, 227, 228, 230, 233, 207, 205, 209, 267, 268, 265, 219

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched
NONE

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
WEST, Smart Patents, IEL Online
Search terms: Background noise, speech, linear prediction, LSF, LPC, quantization, speech detection, STFTM, harmonics, pitch,
probability, estimation, periodicity, pitch

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category; citation of document, with indication, where appropriate, of the relevant passages; relevant to claim No.

Y; US 4,937,873 A (McAULAY et al.) 26 June 1990 (26.06.1990), abstract, Figs. 4, 5, Col. 7, line 31 - Col. 10, line 30; claims 1, 10
Y; US 5,054,072 A (McAULAY et al.) 01 October 1991 (01.10.1991), Fig. 1, Col. 4, line 33 - Col. 11, line 68; claims [illegible]
Y; US 5,664,051 A (HARDWICK et al.) 02 September 1997 (02.09.1997), abstract, Col. 4, line 11 - Col. 6, line 51; claims 1-15
Y,P; US 6,070,137 A (BLOEBAUM et al.) 30 May 2000 (30.05.2000), abstract, Figs. 3-5, Col. 4, line 16 - Col. [illegible], line 20; claims 1-15
Y,P; US 6,182,033 B1 (ACCARDI et al.) 30 January 2001 (30.01.2001), Col. 3, line 66 -; claims 1, 10

Date of the actual completion of the international search: 31 March 2001 (31.03.2001)
Name and mailing address of the ISA/US
Facsimile No. (703) 305-3230
Authorized officer: William R. Korzuch

Form PCT/ISA/210 (second sheet) (July 1998)